@roomote roomote bot commented Aug 1, 2025

This PR implements a persistent retry queue for telemetry events that fail to send to Roo Code Cloud, as requested in #4940.

Changes

  • TelemetryQueue class: Implements a FIFO queue with persistence using VSCode globalState
  • Queue integration: Modified TelemetryClient to enqueue events instead of sending directly
  • Automatic retry: Failed events are retried up to 3 times before being discarded
  • Background processing: Queue is processed asynchronously when new events are added
  • Size limits: Queue is limited to 1000 events to prevent unbounded growth
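
A minimal sketch of the bounded, persisted FIFO idea described above, not the PR's actual `TelemetryQueue`. The `KeyValueStore` interface, `BoundedPersistentQueue`, and `InMemoryStore` are hypothetical names; the store is only shaped like the `get`/`update` pair that VS Code's `globalState` exposes. Dropping the oldest events when the limit is hit is an assumption, since the description does not say which events are discarded.

```typescript
// Hypothetical store, shaped like the get/update pair on VS Code's Memento.
interface KeyValueStore {
	get<T>(key: string): T | undefined
	update(key: string, value: unknown): Promise<void>
}

interface QueuedEvent<T> {
	payload: T
	retryCount: number
}

class BoundedPersistentQueue<T> {
	private readonly store: KeyValueStore
	private readonly key: string
	private readonly maxSize: number

	constructor(store: KeyValueStore, key: string, maxSize: number) {
		this.store = store
		this.key = key
		this.maxSize = maxSize
	}

	// Reads the persisted array on every call, so a new instance sees old events.
	private read(): QueuedEvent<T>[] {
		return this.store.get<QueuedEvent<T>[]>(this.key) ?? []
	}

	async enqueue(payload: T): Promise<void> {
		const items = [...this.read(), { payload, retryCount: 0 }]
		// Assumption: when over the limit, the oldest events are dropped first.
		const trimmed = items.slice(Math.max(0, items.length - this.maxSize))
		await this.store.update(this.key, trimmed)
	}

	peek(): QueuedEvent<T> | undefined {
		return this.read()[0]
	}

	size(): number {
		return this.read().length
	}
}

// In-memory stand-in for persistent storage, for illustration only.
class InMemoryStore implements KeyValueStore {
	private readonly data = new Map<string, unknown>()

	get<T>(key: string): T | undefined {
		return this.data.get(key) as T | undefined
	}

	async update(key: string, value: unknown): Promise<void> {
		this.data.set(key, value)
	}
}
```

Because the queue holds no state of its own, a second instance created over the same store (for example after a restart) picks up whatever the first one persisted.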

Key Features

  • ✅ Events persist across VSCode restarts
  • ✅ FIFO ordering ensures events are sent in the correct sequence
  • ✅ Failed events are moved to the end of the queue with incremented retry count
  • ✅ Events are removed only after successful confirmation from the cloud service
  • ✅ Comprehensive test coverage for all queue operations
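
The requeue-on-failure behavior in the list above can be sketched as a pure function over the queue contents. `settleHead` and `QueuedEvent` are hypothetical names, and the exact cutoff is an assumption: here the first send is not counted as a retry, so an event is discarded once its retry count would exceed `maxRetries`.

```typescript
interface QueuedEvent<T> {
	payload: T
	retryCount: number
}

// Returns the queue contents after one send attempt on the head event.
function settleHead<T>(
	queue: readonly QueuedEvent<T>[],
	sent: boolean,
	maxRetries: number = 3,
): QueuedEvent<T>[] {
	if (queue.length === 0) {
		return []
	}
	const [head, ...rest] = queue
	if (sent) {
		// Confirmed by the service: the event can be removed.
		return rest
	}
	const retryCount = head.retryCount + 1
	// Assumption: discard once the retry budget is spent, otherwise move to the end.
	return retryCount > maxRetries ? rest : [...rest, { ...head, retryCount }]
}
```

Note that moving a failed event to the end means a retried event is delivered after events that were captured later.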

Testing

  • Added TelemetryQueue.test.ts with tests for all queue operations
  • Added TelemetryClient.queue.test.ts for integration testing
  • Updated existing TelemetryClient tests to work with the new queue-based approach
  • All tests pass successfully

Fixes #4940


Important

Introduces a persistent retry queue for telemetry events in TelemetryClient, ensuring failed events are retried and not lost, with extensive test coverage.

  • Behavior:
    • Introduces TelemetryQueue class for persistent FIFO queue using VSCode globalState.
    • TelemetryClient now enqueues events instead of sending directly, with automatic retries up to 3 times.
    • Asynchronous background processing of the queue when new events are added.
    • Queue size limited to 1000 events to prevent unbounded growth.
  • Integration:
    • TelemetryClient modified to use TelemetryQueue for event handling.
    • Events persist across VSCode restarts and are sent in FIFO order.
    • Failed events are moved to the end of the queue with incremented retry count.
    • Events removed only after successful confirmation from the cloud service.
  • Testing:
    • Added TelemetryQueue.test.ts for queue operations.
    • Added TelemetryClient.queue.test.ts for integration testing.
    • Updated existing TelemetryClient tests for queue-based approach.
    • Comprehensive test coverage for all queue operations.

This description was created by Ellipsis for a629172.

- Implement TelemetryQueue class with FIFO queue structure
- Add persistence using VSCode globalState
- Integrate queue into TelemetryClient for automatic retry
- Process pending events when new events are captured
- Add comprehensive test coverage for queue functionality
- Limit queue size to 1000 events and max 3 retries per event

Fixes #4940
@roomote roomote bot requested review from cte, jr and mrubens as code owners August 1, 2025 21:50
@dosubot dosubot bot added the size:XXL (This PR changes 1000+ lines, ignoring generated files.) and enhancement (New feature or request) labels Aug 1, 2025

@roomote roomote bot left a comment

Reviewing my own code is like debugging in a mirror - everything looks backwards but the bugs are still mine.

await this.queue.enqueue(result.data)

// Process queue asynchronously if not already processing
if (!this.queue.isProcessingQueue()) {

Is this intentional? There's a potential race condition here between checking isProcessingQueue() and actually setting the processing state in processQueue(). Multiple simultaneous captures could trigger multiple queue processors.

Could we consider moving the processing state flag setting immediately after the check, or using a more atomic approach?
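
One way to make the check and the state change effectively atomic is to store the in-flight promise synchronously, before anything is awaited. This is a sketch under that assumption; `SingleFlight` is a hypothetical helper, not something in this PR.

```typescript
class SingleFlight {
	private inFlight: Promise<void> | undefined

	// Starts the task unless one is already running; concurrent callers share the same promise.
	run(task: () => Promise<void>): Promise<void> {
		if (this.inFlight !== undefined) {
			return this.inFlight
		}
		const running = task()
		// Recorded in the same synchronous turn as the check, so no second caller can slip in.
		this.inFlight = running
		const clear = (): void => {
			this.inFlight = undefined
		}
		// Clear the marker on both success and failure.
		running.then(clear, clear)
		return running
	}
}
```

With something like this, the capture path would call `run(() => this.processQueue())` unconditionally instead of checking `isProcessingQueue()` first.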

})

// Check if response indicates success (fetch doesn't throw on HTTP errors)
if (response === undefined || (response && response.ok !== false)) {

This condition seems a bit convoluted. Would this be clearer?

Suggested change
- if (response === undefined || (response && response.ok !== false)) {
+ // Check if response indicates success
+ if (response && response.ok) {

The current logic treats undefined as success, which might be confusing for future maintainers.
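
To make the difference concrete, here is a small sketch of the two predicates side by side. `ResponseLike`, `currentCheck`, and `suggestedCheck` are hypothetical names used only for this comparison; `ok` mirrors the `ok` flag on a fetch `Response`, which is true for 2xx statuses.

```typescript
interface ResponseLike {
	ok?: boolean
}

// The condition as currently written: a missing response or a missing `ok` counts as success.
function currentCheck(response: ResponseLike | undefined): boolean {
	return response === undefined || (!!response && response.ok !== false)
}

// The suggested condition: only an explicit `ok === true` counts as success.
function suggestedCheck(response: ResponseLike | undefined): boolean {
	return response !== undefined && response.ok === true
}
```

The two agree on `{ ok: true }` and `{ ok: false }` and differ only for `undefined` and for objects without an `ok` field, so any test mocks that resolve to `undefined` would need to return `{ ok: true }` under the stricter check.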


export class TelemetryQueue {
	private static readonly QUEUE_KEY = "rooCode.telemetryQueue"
	private static readonly MAX_QUEUE_SIZE = 1000 // Prevent unbounded growth

Could we consider making these limits configurable? Different deployment scenarios might benefit from different queue sizes and retry counts. Perhaps through extension settings or environment variables?
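
A rough sketch of what configurable limits might look like, keeping the current values as defaults. `TelemetryQueueOptions`, `DEFAULT_QUEUE_OPTIONS`, and `resolveQueueOptions` are hypothetical names; where the overrides come from (extension settings, environment variables) is left open here.

```typescript
interface TelemetryQueueOptions {
	maxQueueSize: number
	maxRetries: number
}

// Defaults match the limits currently hard-coded in this PR.
const DEFAULT_QUEUE_OPTIONS: TelemetryQueueOptions = { maxQueueSize: 1000, maxRetries: 3 }

// Falls back to the default when the override is missing or not a positive integer.
function positiveIntegerOr(value: number | undefined, fallback: number): number {
	return value !== undefined && Number.isInteger(value) && value > 0 ? value : fallback
}

function resolveQueueOptions(overrides: Partial<TelemetryQueueOptions> = {}): TelemetryQueueOptions {
	return {
		maxQueueSize: positiveIntegerOr(overrides.maxQueueSize, DEFAULT_QUEUE_OPTIONS.maxQueueSize),
		maxRetries: positiveIntegerOr(overrides.maxRetries, DEFAULT_QUEUE_OPTIONS.maxRetries),
	}
}
```

Validating at the boundary means a bad setting degrades to the current behavior rather than producing an unbounded or zero-length queue.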

	 * Processes the telemetry queue, sending events to the cloud service
	 */
	private async processQueue(): Promise<void> {
		if (!this.authService.isAuthenticated()) {

I notice the queue isn't processed automatically when the authentication state changes. If a user queues events while offline or signed out, then comes back online and authenticates, nothing is sent until a new event is captured. Should we consider adding a listener for auth state changes to process the queue?
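
A sketch of what such a listener could look like. `AuthServiceStub`, `onDidAuthenticate`, and `processQueueOnAuth` are hypothetical; the real `authService` may expose a different event API, so this only illustrates the wiring.

```typescript
type Unsubscribe = () => void

// Hypothetical auth service with a sign-in event; stands in for the real authService.
class AuthServiceStub {
	private authenticated = false
	private listeners: Array<() => void> = []

	isAuthenticated(): boolean {
		return this.authenticated
	}

	onDidAuthenticate(listener: () => void): Unsubscribe {
		this.listeners.push(listener)
		return () => {
			this.listeners = this.listeners.filter((l) => l !== listener)
		}
	}

	signIn(): void {
		this.authenticated = true
		for (const listener of [...this.listeners]) {
			listener()
		}
	}
}

// Processes the queue whenever the user authenticates; returns a function that stops listening.
function processQueueOnAuth(auth: AuthServiceStub, processQueue: () => Promise<void>): Unsubscribe {
	return auth.onDidAuthenticate(() => {
		// Fire and forget: on failure, events stay queued for the next trigger.
		processQueue().catch(() => {})
	})
}
```

The returned unsubscribe function would need to be called when the client is disposed so the listener does not outlive it.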

@hannesrudolph hannesrudolph added the Issue/PR - Triage (New issue. Needs quick review to confirm validity and assign labels.) label Aug 1, 2025
@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Aug 2, 2025
@hannesrudolph hannesrudolph added the PR - Needs Preliminary Review label and removed the Issue/PR - Triage (New issue. Needs quick review to confirm validity and assign labels.) label Aug 2, 2025
@daniel-lxs daniel-lxs marked this pull request as draft August 5, 2025 20:48
@daniel-lxs daniel-lxs moved this from PR [Needs Prelim Review] to PR [Draft / In Progress] in Roo Code Roadmap Aug 5, 2025
@github-project-automation github-project-automation bot moved this from PR [Draft / In Progress] to Done in Roo Code Roadmap Aug 6, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Aug 6, 2025

Labels

enhancement (New feature or request), PR - Draft / In Progress, size:XXL (This PR changes 1000+ lines, ignoring generated files.)

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Add persistent retry queue for failed telemetry events to Roo Code Cloud

3 participants